Abstract

For a binary-input memoryless symmetric channel $W$, we consider the asymptotic behavior of the polarization process in the large block-length regime when transmission takes place over $W$. In particular, we study the asymptotics of the cumulative distribution $\mathbb{P}(Z_n \le z)$, where $\{Z_n\}$ is the Bhattacharyya process defined from $W$, and its dependence on the rate of transmission. On the basis of this result, we characterize the asymptotic behavior, as well as its dependence on the rate, of the block error probability of polar codes using the successive cancellation decoder. This refines the original bounds by Arikan and Telatar. Our results apply to general polar codes based on $\ell \times \ell$ kernel matrices.

We also provide lower bounds on the block error probability of polar codes using the MAP decoder. The MAP lower bound and the successive cancellation upper bound coincide when $\ell = 2$, but there is a gap for $\ell > 2$.

I. Introduction 

A. Polar Codes 

Polar codes, introduced by Arikan [1], are a family of codes that provably achieve the capacity of binary-input memoryless symmetric (BMS) channels using low-complexity encoding and decoding algorithms. Since their invention, there has been a large body of work that has analyzed (see, e.g., [2]-[11]) and extended (see, e.g., [12]-[20]) these codes.

The construction of polar codes is based on an $\ell \times \ell$ matrix $G$ with entries in $\{0, 1\}$, called the kernel matrix. Besides being invertible, the matrix $G$ should have the property that none of its column permutations is upper triangular [13]. We call a matrix $G$ with these properties a polarizing matrix, and in the following, whenever we speak of a kernel matrix $G$, we assume that $G$ is polarizing.

The material in this paper was presented in part in [6], [7], [8] and [11]. 
The rows of the generator matrix of a polar code with block-length $N = \ell^n$ are chosen from the rows of the matrix

$$G^{\otimes n} = \underbrace{G \otimes G \otimes \cdots \otimes G}_{n},$$

where $\otimes$ denotes the Kronecker product. For the case $\ell = 2$ and the choice $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, Reed-Muller (RM) codes also fall into this category. However, the crucial difference between polar codes and RM codes lies in the choice of the rows. For RM codes, the rows of largest weight are chosen, whereas for polar codes the choice depends on the channel and is made using a method called channel polarization. We briefly review this method and explain how polar codes are constructed from it. We also refer the reader to [1], [5] and [13] for a detailed discussion.
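As a concrete illustration of $G^{\otimes n}$ and of the RM choice of rows, the following minimal Python sketch (our own, not part of the paper) builds Kronecker powers of $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ with plain lists and keeps, as RM($r$, $n$) does, the rows of weight at least $2^{n-r}$. The helper names `kron`, `kron_power` and `rm_row_indices` are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two 0/1 matrices given as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def kron_power(g, n):
    """G^{(x)n} = G (x) G (x) ... (x) G with n factors; n = 0 gives [[1]]."""
    result = [[1]]
    for _ in range(n):
        result = kron(result, g)
    return result


def rm_row_indices(n, r):
    """0-based rows of G^{(x)n}, G = [[1,0],[1,1]], kept by RM(r, n): weight >= 2^(n-r)."""
    rows = kron_power([[1, 0], [1, 1]], n)
    return [i for i, row in enumerate(rows) if sum(row) >= 2 ** (n - r)]


G = [[1, 0], [1, 1]]
print(kron_power(G, 2))                        # [[1, 0, 0, 0], [1, 1, 0, 0], [1, 0, 1, 0], [1, 1, 1, 1]]
print([sum(row) for row in kron_power(G, 3)])  # [1, 2, 2, 4, 2, 4, 4, 8]
print(rm_row_indices(3, 1))                    # [3, 5, 6, 7]
```

For this kernel, row $i$ (0-based) of $G^{\otimes n}$ has weight $2^{\mathrm{popcount}(i)}$, which is why RM codes pick indices with many ones in their binary expansion.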

B. Channel Polarization 

Let $W$ be a BMS channel, and let $\mathcal{X} = \{0, 1\}$ denote its input alphabet, $\mathcal{Y}$ the output alphabet, and $W(y \mid x)$ the transition probabilities. Let $I(W) \in [0, 1]$ denote the mutual information between the input and output of $W$ with uniform distribution on the input. The capacity of a BMS channel $W$ is equal to $I(W)$. Also, the Bhattacharyya parameter of $W$, denoted by $Z(W)$, is defined as

$$Z(W) = \sum_{y \in \mathcal{Y}} \sqrt{W(y \mid 0)\, W(y \mid 1)}.$$

It provides upper and lower bounds on the error probability $P_e(W)$ of estimating the channel input $x$ on the basis of the channel output $y$ via maximum-likelihood (ML) decoding of $W(y \mid x)$ as follows [22, Chapter 4], [5]:

$$\frac{1}{2}\left(1 - \sqrt{1 - Z(W)^2}\right) \le P_e(W) \le \frac{1}{2} Z(W). \qquad (1)$$

It is also related to the capacity $I(W)$ via

$$Z(W) + I(W) \ge 1, \qquad Z(W)^2 + I(W)^2 \le 1,$$

both proved in [1].
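The quantities in (1) and the two inequalities above are easy to check numerically. The sketch below assumes the channel is given as a finite table `w[x][y]` $= W(y \mid x)$ (a representation chosen for illustration only) and evaluates $Z(W)$, $I(W)$ and the single-use ML error $P_e(W) = \frac{1}{2}\sum_y \min\{W(y \mid 0), W(y \mid 1)\}$ for a BSC and a BEC; the function names are hypothetical.

```python
import math


def bhattacharyya(w):
    """Z(W) = sum_y sqrt(W(y|0) W(y|1))."""
    return sum(math.sqrt(a * b) for a, b in zip(w[0], w[1]))


def mutual_information(w):
    """I(W) in bits for a uniform input distribution."""
    total = 0.0
    for x in (0, 1):
        for y, p in enumerate(w[x]):
            if p > 0:
                q = 0.5 * (w[0][y] + w[1][y])   # output probability of y
                total += 0.5 * p * math.log2(p / q)
    return total


def ml_error(w):
    """ML error for a uniform input: (1/2) sum_y min(W(y|0), W(y|1))."""
    return 0.5 * sum(min(a, b) for a, b in zip(w[0], w[1]))


def check_bounds(w, tol=1e-12):
    z, i, pe = bhattacharyya(w), mutual_information(w), ml_error(w)
    lower = 0.5 * (1 - math.sqrt(max(0.0, 1 - z * z)))
    assert lower <= pe + tol and pe <= 0.5 * z + tol       # inequality (1)
    assert z + i >= 1 - tol and z * z + i * i <= 1 + tol   # Z + I >= 1, Z^2 + I^2 <= 1
    return z, i, pe


p, eps = 0.11, 0.3
bsc = [[1 - p, p], [p, 1 - p]]
bec = [[1 - eps, eps, 0.0], [0.0, eps, 1 - eps]]   # outputs: 0, erasure, 1
print(check_bounds(bsc))
print(check_bounds(bec))   # approximately (0.3, 0.7, 0.15)
```

For the BSC the lower bound in (1) is met with equality, and for the BEC the upper bound is (as is $Z(W) + I(W) \ge 1$), up to floating-point rounding.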

The method of channel polarization is defined as follows. Take $N = \ell^n$ copies of a BMS channel $W$. Combine them by using the kernel matrix $G$ to make a new set of $\ell^n$ channels $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$. The construction of these channels is done by recursively applying a transform called channel splitting. Channel splitting is a transform which takes a BMS channel $W$ as input and outputs $\ell$ BMS channels $W^{(j)}$, $0 \le j \le \ell - 1$. The channels $W^{(j)}$ are constructed according to the following rule. Consider a random row vector $U_0^{\ell-1} = (U_0, \ldots, U_{\ell-1})$ that is uniformly distributed over $\{0, 1\}^\ell$. Let $X_0^{\ell-1} = U_0^{\ell-1} G$, where the arithmetic is in GF(2). Also, let $Y_0^{\ell-1}$ be the output of $\ell$ uses of $W$ over the input $X_0^{\ell-1}$. We define the channel between $U_0^{\ell-1}$ and $Y_0^{\ell-1}$ by the transition probabilities

$$W_\ell(y_0^{\ell-1} \mid u_0^{\ell-1}) \triangleq \prod_{i=0}^{\ell-1} W(y_i \mid x_i) = \prod_{i=0}^{\ell-1} W\big(y_i \mid (u_0^{\ell-1} G)_i\big). \qquad (2)$$

The channel $W^{(j)}: \{0, 1\} \to \mathcal{Y}^\ell \times \{0, 1\}^j$ is defined as the BMS channel with input $u_j$, output $(y_0^{\ell-1}, u_0^{j-1})$ and transition probabilities

$$W^{(j)}(y_0^{\ell-1}, u_0^{j-1} \mid u_j) = \frac{1}{2^{\ell-1}} \sum_{u_{j+1}^{\ell-1}} W_\ell(y_0^{\ell-1} \mid u_0^{\ell-1}). \qquad (3)$$

Here and hereafter, $u_i^j$ denotes the subvector $(u_i, \ldots, u_j)$.
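To make (2) and (3) concrete, here is a brute-force sketch that enumerates all inputs and outputs. Its cost grows exponentially in $\ell$, so it is only meant for tiny kernels and small output alphabets; the table representation `w[x][y]` and the function names are our own assumptions. For $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ it reproduces the known facts $Z(W^{(1)}) = Z(W)^2$ and $Z(W^{(0)}) \le 2Z(W) - Z(W)^2$ [1].

```python
import itertools
import math


def encode(u, g):
    """x = u G over GF(2)."""
    ell = len(g)
    return tuple(sum(u[i] * g[i][k] for i in range(ell)) % 2 for k in range(ell))


def split_channel(w, g, j):
    """Return W^(j) as [row for u_j = 0, row for u_j = 1]; the output alphabet is
    all pairs (y_0^{l-1}, u_0^{j-1}), enumerated in a fixed order."""
    ell, n_out = len(g), len(w[0])
    rows = ([], [])
    for ys in itertools.product(range(n_out), repeat=ell):
        for prefix in itertools.product((0, 1), repeat=j):
            for uj in (0, 1):
                total = 0.0
                for suffix in itertools.product((0, 1), repeat=ell - j - 1):
                    x = encode(prefix + (uj,) + suffix, g)
                    total += math.prod(w[x[k]][ys[k]] for k in range(ell))   # (2)
                rows[uj].append(total / 2 ** (ell - 1))                      # (3)
    return [rows[0], rows[1]]


def bhattacharyya(w):
    return sum(math.sqrt(a * b) for a, b in zip(w[0], w[1]))


G = [[1, 0], [1, 1]]
bsc = [[0.9, 0.1], [0.1, 0.9]]
z = bhattacharyya(bsc)                                         # 0.6
print(bhattacharyya(split_channel(bsc, G, 1)), z * z)          # equal
print(bhattacharyya(split_channel(bsc, G, 0)), 2 * z - z * z)  # first <= second
```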

The construction of the channels $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$ can be visualized in the following way [1]. Consider an infinite $\ell$-ary tree with the root node placed at the top. To each vertex of the tree, we assign a channel in such a way that the collection of all the channels that correspond to the vertices at depth $n$ equals $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$. We do this by a recursive procedure. Assign to the root node the channel $W$ itself. From left to right, assign $W^{(0)}$ to $W^{(\ell-1)}$ to the children of the root node. In general, if $Q$ is the channel that is assigned to vertex $v$, we assign $Q^{(0)}$ to $Q^{(\ell-1)}$, from left to right respectively, to the children of the node $v$. There are $\ell^n$ vertices at level $n$ in this $\ell$-ary tree. Assume that we label these vertices from left to right from $1$ to $\ell^n$. Let the channel assigned to the $i$th vertex, $1 \le i \le \ell^n$, be $W_N^{(i)}$. Also, let the $\ell$-ary representation of $i - 1$ be $b_1 b_2 \cdots b_n$, where $b_1$ is the most significant digit. Then we have

$$W_N^{(i)} = \Big( \cdots \big( (W^{(b_1)})^{(b_2)} \big) \cdots \Big)^{(b_n)}.$$

As an example, assuming $i = 7$, $n = 3$ and $\ell = 2$ (so that $i - 1 = 6 = 110_2$), we have $W_8^{(7)} = \big((W^{(1)})^{(1)}\big)^{(0)}$.
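The labeling above amounts to reading the $\ell$-ary digits of $i - 1$ starting from the most significant one. A small sketch, assuming $\ell = 2$, $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and a BEC, for which $W^{(0)}$ and $W^{(1)}$ are again BECs with Bhattacharyya parameters $2Z - Z^2$ and $Z^2$ [1]; the helper names are hypothetical.

```python
def digits(i, ell, n):
    """l-ary digits b_1 ... b_n of i - 1, most significant first (1 <= i <= l^n)."""
    v, out = i - 1, []
    for _ in range(n):
        out.append(v % ell)
        v //= ell
    return out[::-1]


def bec_channel_z(i, n, eps):
    """Z(W_N^(i)) for W = BEC(eps), walking the tree from the root:
    digit 0 -> W^(0): Z -> 2Z - Z^2, digit 1 -> W^(1): Z -> Z^2 (exact for the BEC)."""
    z = eps
    for b in digits(i, 2, n):
        z = z * z if b == 1 else 2 * z - z * z
    return z


print(digits(7, 2, 3))           # [1, 1, 0], i.e. W_8^(7) = ((W^(1))^(1))^(0)
print(bec_channel_z(7, 3, 0.5))  # 0.12109375
```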

The channels $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$ have the property that, as $n$ grows large, a fraction close to $I(W)$ of the channels have capacity close to 1 (or Bhattacharyya parameter close to 0), and a fraction close to $1 - I(W)$ of the channels have capacity close to 0 (or Bhattacharyya parameter close to 1). The basic idea behind polar codes is to use those channels with capacity close to 1 for information transmission. Accordingly, given the rate $R < I(W)$ and block-length $N = \ell^n$, the rows of the generator matrix of a polar code of block-length $N$ correspond to a subset of the rows of the matrix $G^{\otimes n}$ whose indices are chosen with the following rule: choose a subset of size $NR$ of the channels $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$ with the smallest values of the Bhattacharyya parameter, and choose the rows of $G^{\otimes n}$ with the indices corresponding to those channels. For example, if the channel $W_N^{(i)}$ is chosen, then the $j$th row of $G^{\otimes n}$ is selected, where the $\ell$-ary representation of $j - 1$ is the digit-reversed version of that of $i - 1$. We decode using a successive cancellation (SC) decoder. This algorithm decodes the bits one by one in a pre-chosen order that is closely related to how the row indices of $G^{\otimes n}$ are chosen.
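The construction rule (keep the $NR$ smallest Bhattacharyya parameters, then digit-reverse the indices) can be sketched for the BEC, where the parameters are exact. The sketch assumes $\ell = 2$ and $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$; ties are broken by index, an arbitrary choice of ours, and the function names are hypothetical.

```python
def reverse_digits(v, ell, n):
    """Reverse the n-digit l-ary representation of v (maps i - 1 to j - 1)."""
    r = 0
    for _ in range(n):
        r = r * ell + v % ell
        v //= ell
    return r


def polar_rows_bec(n, eps, rate):
    """0-based row indices of G^{(x)n} selected for BEC(eps), G = [[1,0],[1,1]]."""
    N = 2 ** n
    z = [eps]
    for _ in range(n):  # level-by-level tree expansion; index = i - 1, MSB first
        z = [c for v in z for c in (2 * v - v * v, v * v)]
    k = round(N * rate)                               # NR information channels
    best = sorted(range(N), key=lambda v: z[v])[:k]   # channel indices i - 1
    return sorted(reverse_digits(v, 2, n) for v in best)


print(polar_rows_bec(3, 0.5, 0.5))   # [3, 5, 6, 7]
```

In this toy case ($n = 3$, $\epsilon = 0.5$, $R = 1/2$) the selected rows coincide with those of the RM(1, 3) code; for larger $n$ the two choices differ in general.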

C. Problem Formulation and Relevant Work 

Let $\mathcal{I}$ be the set of indices of the $NR$ channels in the set $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$ with the smallest values of the Bhattacharyya parameter. Let $P_e^{\mathrm{SC}}(N, R)$ and $P_e^{\mathrm{MAP}}(N, R)$ denote the average block error probability of the SC and the maximum a posteriori (MAP) decoders, respectively, with block-length $N$ and rate $R$. For the SC decoder we have [1], [5]

$$\max_{i \in \mathcal{I}} \frac{1}{2}\left(1 - \sqrt{1 - Z(W_N^{(i)})^2}\right) \le P_e^{\mathrm{SC}}(N, R) \le \sum_{i \in \mathcal{I}} Z(W_N^{(i)}). \qquad (4)$$

This relation shows that the distribution of the Bhattacharyya parameters of the channels $\{W_N^{(i)}\}_{1 \le i \le \ell^n}$ plays a fundamental role in the analysis of polar codes. More precisely, for $n \in \mathbb{N} = \{0, 1, 2, \ldots\}$ and $0 \le z \le 1$, we are interested in analyzing the behavior of

$$F(n, z) \triangleq \frac{\left|\{ i \in \{1, \ldots, \ell^n\} : Z(W_N^{(i)}) \le z \}\right|}{\ell^n}, \qquad (5)$$

where $|A|$ denotes the number of elements of the set $A$. There is an entirely equivalent probabilistic description of (5). Define the "polarization" process [2] of the channel $W$ as a channel-valued stochastic process $\{W_n\}_{n \in \mathbb{N}}$ with $W_0 = W$ and

$$W_{n+1} = W_n^{(B_n)}, \qquad (6)$$

where $\{B_n\}_{n \in \mathbb{N}}$ is a sequence of independent and identically distributed (i.i.d.) random variables with distribution $\mathbb{P}(B_0 = i) = 1/\ell$ for $i \in \{0, 1, \ldots, \ell - 1\}$. In other words, the process begins at the root node of the infinite $\ell$-ary tree introduced above, and in each step it chooses one of the $\ell$ children of the current node with uniform probability. So at time $n$, the process $\{W_n\}_{n \in \mathbb{N}}$ outputs one of the channels at level $n$ of the tree uniformly at random. The Bhattacharyya process $\{Z_n\}_{n \in \mathbb{N}}$ of the channel $W$ is defined from the polarization process as $Z_n = Z(W_n)$. In this setting we have

$$\mathbb{P}(Z_n \le z) = F(n, z). \qquad (7)$$
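The equivalence (7) between the counting in (5) and the random walk (6) can be checked numerically for the BEC. The sketch below is ours and assumes $\ell = 2$, $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ and the BEC recursions $Z \mapsto 2Z - Z^2$ for $B_n = 0$ and $Z \mapsto Z^2$ for $B_n = 1$; it computes $F(n, z)$ exactly and compares it with a Monte Carlo estimate of $\mathbb{P}(Z_n \le z)$. Sample size and seed are arbitrary.

```python
import random


def bec_level(n, eps):
    """Z(W_N^(i)), i = 1..2^n, for BEC(eps) and G = [[1,0],[1,1]], in tree order."""
    z = [eps]
    for _ in range(n):
        z = [c for v in z for c in (2 * v - v * v, v * v)]
    return z


def f_exact(n, z, eps):
    """F(n, z) of (5): fraction of level-n channels with Z <= z."""
    vals = bec_level(n, eps)
    return sum(v <= z for v in vals) / len(vals)


def f_monte_carlo(n, z, eps, samples=20000, seed=1):
    """Estimate P(Z_n <= z) of (7) by sampling B_0, ..., B_{n-1} uniformly."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(samples):
        zn = eps
        for _ in range(n):
            zn = zn * zn if rng.randrange(2) else 2 * zn - zn * zn
        hits += zn <= z
    return hits / samples


print(f_exact(3, 0.5, 0.5))                                # 0.5
print(f_exact(10, 0.1, 0.3), f_monte_carlo(10, 0.1, 0.3))  # close to each other
```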

It was shown in [2] and [5] that the Bhattacharyya process $\{Z_n\}_{n \in \mathbb{N}}$ converges almost surely to a $\{0, 1\}$-valued random variable $Z_\infty$ with $\mathbb{P}(Z_\infty = 0) = I(W)$. Our objective is to investigate the asymptotic behavior of $\mathbb{P}(Z_n \le z)$. The analysis of the process $\{Z_n\}_{n \in \mathbb{N}}$ around the point $z = 0$ is of particular interest, as this indicates how the "good" channels (i.e., the channels that have mutual information close to 1) behave. The asymptotic analysis of the process is closely related to the "partial distances" of the kernel matrix $G$:

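Definition 1 (Partial Distances): We define the partial distances $D_i(G)$, $i = 0, \ldots, \ell - 1$, of an $\ell \times \ell$ matrix

$$G = \begin{bmatrix} g_0 \\ \vdots \\ g_{\ell-1} \end{bmatrix}$$

($g_i$'s are row vectors) as

$$D_i(G) \triangleq d_H\big(\{g_i\}, \langle g_{i+1}, \ldots, g_{\ell-1} \rangle\big), \quad i = 0, \ldots, \ell - 2,$$

$$D_{\ell-1}(G) \triangleq d_H\big(\{g_{\ell-1}\}, \{0\}\big),$$

where $d_H(\cdot, \cdot)$ denotes the Hamming distance between two sets of binary sequences, and where $\langle g_{i+1}, \ldots, g_{\ell-1} \rangle$ denotes the linear space spanned by $g_{i+1}, \ldots, g_{\ell-1}$. The exponent of $G$ is then defined as

$$E(G) \triangleq \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell D_i(G),$$

and the second exponent of $G$ is defined as

$$V(G) \triangleq \frac{1}{\ell} \sum_{i=0}^{\ell-1} \big(\log_\ell D_i(G) - E(G)\big)^2.$$

In other words, the exponent $E(G)$ and the second exponent $V(G)$ are the mean and the variance of the random variable $\log_\ell D_B(G)$, where $B$ is a random variable taking values in $\{0, 1, \ldots, \ell - 1\}$ with uniform probability. It should be noted that the invertibility of $G$ implies that the partial distances $\{D_i(G)\}$ are strictly positive, making the exponent $E(G)$ finite. Note also that the condition for a matrix $G$ to be polarizing, namely that none of the column permutations of $G$ is upper triangular, implies that at least one of the partial distances $\{D_i(G)\}$ is strictly greater than 1, so that $E(G)$ is strictly positive. (For instance, $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$ has $D_0(G) = 1$ and $D_1(G) = 2$, hence $E(G) = 1/2$ and $V(G) = 1/4$.)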
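The partial distances, $E(G)$ and $V(G)$ can be computed by brute force for small kernels. The sketch below uses our own helper names and enumerates the whole span, so it is only suitable for small $\ell$.

```python
import math


def span(rows, length):
    """All GF(2) linear combinations of the given rows (always contains 0)."""
    vecs = {tuple([0] * length)}
    for r in rows:
        vecs |= {tuple((a + b) % 2 for a, b in zip(v, r)) for v in vecs}
    return vecs


def partial_distances(g):
    """D_i(G) = d_H({g_i}, <g_{i+1}, ..., g_{l-1}>); for i = l-1 the span is {0}."""
    ell = len(g)
    return [min(sum(a != b for a, b in zip(g[i], v)) for v in span(g[i + 1:], ell))
            for i in range(ell)]


def exponents(values, ell):
    """(E, V): mean and variance of log_l of the values under a uniform index."""
    logs = [math.log(d, ell) for d in values]
    mean = sum(logs) / ell
    return mean, sum((x - mean) ** 2 for x in logs) / ell


for G in ([[1, 0], [1, 1]], [[1, 0, 0], [1, 1, 0], [1, 0, 1]]):
    D = partial_distances(G)
    print(D, exponents(D, len(G)))
# [1, 2] (0.5, 0.25); the 3x3 kernel gives [1, 2, 2] and E(G) = (2/3) log_3 2, about 0.42
```

The 3 × 3 kernel is an example we picked for illustration; it is invertible and polarizing, since its last row has weight 2 and therefore no column permutation makes it upper triangular.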
The following theorem partially characterizes the behavior of the process $\{Z_n\}_{n \in \mathbb{N}}$ around $z = 0$.

Theorem 2 ([2] and [5]): Let $W$ be a BMS channel and assume that we are using as the kernel matrix an $\ell \times \ell$ matrix $G$ with exponent $E(G)$. For any fixed $\beta$ with $0 < \beta < E(G)$,

$$\lim_{n \to \infty} \mathbb{P}\big(Z_n \le 2^{-\ell^{n\beta}}\big) = I(W).$$

Conversely, if $I(W) < 1$, then for any fixed $\beta > E(G)$,

$$\lim_{n \to \infty} \mathbb{P}\big(Z_n \ge 2^{-\ell^{n\beta}}\big) = 1.$$



An important consequence of Theorem 2 is the following. By (4), the behavior of $P_e^{\mathrm{SC}}(N, R)$ for polar codes with kernel matrix $G$, block-length $N = \ell^n$ and rate $R < I(W)$ under SC decoding is asymptotically the same as that of $\max_{i \in \mathcal{I}} Z(W_N^{(i)})$; hence the probability of error behaves as $2^{-N^{E(G) + o(1)}}$ as $N$ tends to infinity. A noteworthy point about this result is that the asymptotic behavior of the probability of error is rate-independent, provided that the rate $R$ is less than the capacity $I(W)$. In this paper, we provide a refined estimate of $\mathbb{P}(Z_n \le z)$. Specifically, we derive the asymptotic relation between $\mathbb{P}(Z_n \le z)$ and the rate of transmission $R$. From this we derive the asymptotic behavior of $P_e^{\mathrm{SC}}(N, R)$ and its dependence on the rate of transmission. We further derive lower bounds on the error probability when we perform MAP decoding instead of SC decoding.

An important point to mention here is that the results of this paper are obtained in the asymptotic limit of the block-length for any fixed rate $R$. The regime in which $R$ also varies with the block-length is a different problem, for which we refer the reader to [21].

The outline of the paper is as follows. In Section II we state the main results of the paper. In Section III we first define several auxiliary processes and provide bounds on their asymptotic behavior. Using these bounds, we then prove the main results. We discuss the implications of the proofs for selecting the set of channel indices in Section IV. It should be noted that in the following the logarithms are in base 2 unless explicitly stated otherwise.



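II. Main Results

For a BMS channel $W$, let $\{Z_n = Z(W_n)\}_{n \in \mathbb{N}}$ be the Bhattacharyya process of $W$. Let $Q(t) = \int_t^\infty e^{-z^2/2}\, dz / \sqrt{2\pi}$ be the tail function of the standard Gaussian distribution (the $Q$-function) and $Q^{-1}(\cdot)$ be its inverse function.

Theorem 3: Consider an $\ell \times \ell$ polarizing kernel matrix $G = \begin{bmatrix} g_0 \\ \vdots \\ g_{\ell-1} \end{bmatrix}$.

1) For $R < I(W)$,

$$\lim_{n \to \infty} \mathbb{P}\left(Z_n \le 2^{-\ell^{\,nE(G) + \sqrt{nV(G)}\, Q^{-1}\left(\frac{R}{I(W)}\right) + f(n)}}\right) = R.$$

2) Let $H = [g_{\ell-1}^T, \ldots, g_0^T]^T$ (${}^T$ denotes the transpose) and assume that $D_i(H) \le D_{i+1}(H)$ for $0 \le i \le \ell - 2$. Then, for $R' < 1 - I(W)$, we have

$$\lim_{n \to \infty} \mathbb{P}\left(Z_n \ge 1 - 2^{-\ell^{\,nE(H) + \sqrt{nV(H)}\, Q^{-1}\left(\frac{R'}{1 - I(W)}\right) + f(n)}}\right) = R'.$$

Here, $f(n)$ is any function satisfying $f(n) = o(\sqrt{n})$. ■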

October 6, 2011 DRAFT

Discussion: Theorem 3 characterizes the asymptotic behavior of $\mathbb{P}(Z_n \le z)$ and refines Theorem 2 in the following way. According to Theorem 2, if we transmit at rate $R$ below the channel capacity, then the quantity $\log_\ell(-\log P_e^{\mathrm{SC}}(\ell^n, R))$ scales like $nE(G) + o(n)$. The first part of Theorem 3 gives one further term by stating that the $o(n)$ term is in fact $\sqrt{nV(G)}\, Q^{-1}(R/I(W)) + o(\sqrt{n})$. The second part of Theorem 3, on the other hand, characterizes the asymptotic behavior of $\mathbb{P}(Z_n \le z)$ near $z = 1$, which is important in applications of polar codes to source coding [12]. Put together, Theorem 3 characterizes the scaling of the error probability of polar codes with the SC decoder. A related result holds for the MAP decoder.
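To see what the extra term in Theorem 3 means numerically, the sketch below evaluates the two leading terms $nE(G) + \sqrt{nV(G)}\, Q^{-1}(R/I(W))$ for Arikan's kernel ($E(G) = 1/2$, $V(G) = 1/4$). It drops the $o(\sqrt{n})$ term, so it indicates a scaling rather than a bound; the capacity value is a hypothetical input. $Q^{-1}$ is obtained from Python's `statistics.NormalDist`, using $Q^{-1}(p) = -\Phi^{-1}(p)$.

```python
import math
from statistics import NormalDist


def q_inv(p):
    """Inverse of the Gaussian tail Q(t) = P(N(0,1) > t)."""
    return -NormalDist().inv_cdf(p)


def predicted_loglog(n, e, v, rate, capacity):
    """n E(G) + sqrt(n V(G)) Q^{-1}(R / I(W)): the two leading terms of
    log_l(-log2 P_e^SC(l^n, R)) from Theorem 3 (the o(sqrt(n)) term is dropped)."""
    return n * e + math.sqrt(n * v) * q_inv(rate / capacity)


# l = 2, G = [[1,0],[1,1]]: E(G) = 1/2, V(G) = 1/4.  Hypothetical channel with I(W) = 0.5.
for rate in (0.1, 0.25, 0.4, 0.45):
    print(rate, round(predicted_loglog(20, 0.5, 0.25, rate, 0.5), 3))
```

At $R = I(W)/2$ the correction vanishes because $Q^{-1}(1/2) = 0$; as $R$ approaches $I(W)$, $Q^{-1}(R/I(W))$ becomes negative and the predicted double exponent decreases.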

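Theorem 4: Let $W$ be a BMS channel and let $R < I(W)$ be the rate of transmission. Consider an $\ell \times \ell$ kernel matrix $G$ with $\{w_0(G), \ldots, w_{\ell-1}(G)\}$ the Hamming weights of its rows, and define

$$E_w(G) \triangleq \frac{1}{\ell} \sum_{i=0}^{\ell-1} \log_\ell w_i(G), \qquad V_w(G) \triangleq \frac{1}{\ell} \sum_{i=0}^{\ell-1} \big(\log_\ell w_i(G) - E_w(G)\big)^2. \qquad (8)$$

If we use polar codes of length $N = \ell^n$ and rate $R$ for transmission, then the probability of error under MAP decoding, $P_e^{\mathrm{MAP}}(N, R)$, satisfies

$$\log_\ell\big(-\log P_e^{\mathrm{MAP}}(\ell^n, R)\big) \le nE_w(G) + \sqrt{nV_w(G)}\, Q^{-1}\left(\frac{R}{I(W)}\right) + o(\sqrt{n}). \qquad (9)$$

■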

Discussion: Let $G$ be according to Arikan's original construction [1], i.e., $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, which, up to a permutation of its columns, is the only polarizing matrix for the case $\ell = 2$. For this $G$, we have $w_i(G) = D_i(G)$ for $i = 0$ and $1$. Hence, the block error probability of the SC decoder and the MAP block error probability share the same asymptotic behavior according to Theorems 3 and 4. Since $w_i(G) \ge D_i(G)$ for every $i$, one always has $E_w(G) \ge E(G)$. For a general $\ell \times \ell$ matrix $G$, however, the inequality $E_w(G) > E(G)$ may be strict, in which case there remains an asymptotic gap between the error probability of SC decoding and the lower bound on the MAP error probability. Whether or not this gap can be closed or made narrower is an open problem.
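The gap between $E(G)$ and $E_w(G)$ can be computed directly for small kernels. The sketch below (our own helpers) compares the exponents obtained from partial distances with those of (8) obtained from row weights. The 3 × 3 kernel is a hypothetical example chosen because it is invertible and polarizing (its last row has weight 3) but has partial distances $(1, 1, 3)$ and row weights $(1, 2, 3)$.

```python
import math


def span(rows, length):
    vecs = {tuple([0] * length)}
    for r in rows:
        vecs |= {tuple((a + b) % 2 for a, b in zip(v, r)) for v in vecs}
    return vecs


def partial_distances(g):
    ell = len(g)
    return [min(sum(a != b for a, b in zip(g[i], v)) for v in span(g[i + 1:], ell))
            for i in range(ell)]


def mean_var_log(values, ell):
    logs = [math.log(x, ell) for x in values]
    m = sum(logs) / ell
    return m, sum((x - m) ** 2 for x in logs) / ell


def sc_vs_map_exponents(g):
    """((E(G), V(G)) from partial distances, (E_w(G), V_w(G)) of (8) from row weights)."""
    ell = len(g)
    return mean_var_log(partial_distances(g), ell), mean_var_log([sum(r) for r in g], ell)


print(sc_vs_map_exponents([[1, 0], [1, 1]]))                   # identical pairs
print(sc_vs_map_exponents([[1, 0, 0], [1, 0, 1], [1, 1, 1]]))  # E(G) = 1/3 < E_w(G)
```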

III. Proof of the Main Result 

A. Preliminaries 

Let $\{B_n\}_{n \in \mathbb{N}}$ be a sequence of i.i.d. random variables that take their values in $\{0, 1, \ldots, \ell - 1\}$ with uniform probability, i.e., $\mathbb{P}(B_0 = j) = 1/\ell$ for $j \in \{0, 1, \ldots, \ell - 1\}$. Let $(\Omega, \mathcal{F}, \mathbb{P})$ denote the probability space generated by the sequence $\{B_n\}_{n \in \mathbb{N}}$, and let $(\Omega_n, \mathcal{F}_n, \mathbb{P}_n)$ be the probability space generated by $(B_0, \ldots, B_{n-1})$. We now couple the polarization process $\{W_n\}_{n \in \mathbb{N}}$ with the sequence $\{B_n\}_{n \in \mathbb{N}}$ via (6). Consequently, the Bhattacharyya process $\{Z_n = Z(W_n)\}$ is coupled with the sequence $\{B_n\}_{n \in \mathbb{N}}$. By using the bounds given in [5, Chapter 5], we have the following relationship between the Bhattacharyya parameter of $W^{(i)}$ and that of $W$. Recall that $\{D_i(G)\}_{0 \le i \le \ell - 1}$ are the partial distances of the matrix $G$. We have [5]

$$Z(W)^{D_i(G)} \le Z(W^{(i)}) \le 2^{\ell - i}\, Z(W)^{D_i(G)}. \qquad (10)$$

Also let $H = [g_{\ell-1}^T, \ldots, g_0^T]^T$. Assuming $D_i(H) \le D_{i+1}(H)$ for $0 \le i \le \ell - 2$,

$$\big(1 - Z(W)\big)^{D_i(H)} \le 1 - Z(W^{(i)}) \le 2^{i+1} \big(1 - Z(W)\big)^{D_i(H)}. \qquad (11)$$
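The bounds (10) can be verified by brute force on small examples. The following sketch is our own test harness (not taken from [5]): it computes $Z(W^{(i)})$ directly from (2) and (3) for a BSC and checks it against $Z(W)^{D_i(G)}$ and $2^{\ell - i} Z(W)^{D_i(G)}$. It does not test (11).

```python
import itertools
import math


def encode(u, g):
    ell = len(g)
    return tuple(sum(u[i] * g[i][k] for i in range(ell)) % 2 for k in range(ell))


def split_z(w, g, j):
    """Z(W^(j)) computed directly from (2)-(3) by enumeration."""
    ell, n_out = len(g), len(w[0])
    z = 0.0
    for ys in itertools.product(range(n_out), repeat=ell):
        for prefix in itertools.product((0, 1), repeat=j):
            probs = []
            for uj in (0, 1):
                total = sum(
                    math.prod(w[x][y] for x, y in zip(encode(prefix + (uj,) + s, g), ys))
                    for s in itertools.product((0, 1), repeat=ell - j - 1))
                probs.append(total / 2 ** (ell - 1))
            z += math.sqrt(probs[0] * probs[1])
    return z


def partial_distances(g):
    ell = len(g)
    out = []
    for i in range(ell):
        space = {tuple([0] * ell)}
        for r in g[i + 1:]:
            space |= {tuple((a + b) % 2 for a, b in zip(v, r)) for v in space}
        out.append(min(sum(a != b for a, b in zip(g[i], v)) for v in space))
    return out


def check_eq10(w, g, tol=1e-12):
    ell = len(g)
    zw = sum(math.sqrt(a * b) for a, b in zip(w[0], w[1]))
    results = []
    for i, d in enumerate(partial_distances(g)):
        zi = split_z(w, g, i)
        assert zw ** d - tol <= zi <= 2 ** (ell - i) * zw ** d + tol   # (10)
        results.append((i, d, zi))
    return results


bsc = [[0.9, 0.1], [0.1, 0.9]]
for G in ([[1, 0], [1, 1]], [[1, 0, 0], [1, 1, 0], [1, 0, 1]]):
    print(check_eq10(bsc, G))
```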
B. Proof of Theorem 3

We first provide an intuitive picture behind the result of Theorem 3. For simplicity, assume $\ell = 2$ with $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$, and let the channel be a binary erasure channel (BEC) with erasure probability $\epsilon$. The capacity of this channel is $1 - \epsilon$. For such a channel, the Bhattacharyya process has a simple closed form [1]: $Z_0 = \epsilon$ and

$$Z_{n+1} = \begin{cases} 2Z_n - Z_n^2, & \text{if } B_n = 0, \\ Z_n^2, & \text{if } B_n = 1. \end{cases} \qquad (12)$$

We know from Section I-C that as $n$ grows large, $Z_n$ tends almost surely to a $\{0, 1\}$-valued random variable $Z_\infty$ with $\mathbb{P}(Z_\infty = 0) = 1 - \epsilon$. The asymptotic behavior of $\{Z_n\}$ can be explained roughly by considering the behavior of $\{-\log Z_n\}$. In particular, it is clear from (12) that at time $n + 1$, $-\log Z_n$ is either doubled (when $B_n = 1$) or decreased by at most 1 (when $B_n = 0$), since $2Z_n - Z_n^2 \le 2Z_n$. Also, observe that once $-\log Z_n$ becomes sufficiently large, subtracting 1 from it has a negligible effect compared with the doubling operation. Now assume that $m$ is a sufficiently large number. Conditioned on the event that $-\log Z_m$ is very large (or equivalently, that $Z_m$ is very close to 0, which happens with probability very close to $1 - \epsilon$), for $n \ge m$ the process $\{-\log Z_n\}$ evolves at each step by being doubled if $B_n = 1$ or remaining roughly the same if $B_n = 0$. Hence $\log(-\log Z_n)$ behaves approximately like $\log(-\log Z_m)$ plus the number of indices $i \in \{m, \ldots, n - 1\}$ with $B_i = 1$, and we can use the central limit theorem to characterize the asymptotic behavior of $\{-\log Z_n\}$ for $n \gg m$.
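The heuristic above can be simulated. Because $Z_n$ becomes doubly-exponentially small, the sketch below (ours) tracks $L_n = -\log_2 Z_n$ instead of $Z_n$, taking $B_n = 1$ as the squaring branch as in (12). It reports the fraction of paths with $Z_n < 2^{-1000}$, which should be close to $1 - \epsilon$, and the fraction with $\log_2(-\log_2 Z_n) \ge n/2 + t\sqrt{n}/2$, which by the central-limit picture should approach $(1 - \epsilon)Q(t)$; at moderate $n$ an offset from the lower-order terms is to be expected. Sample size, seed and thresholds are arbitrary choices.

```python
import math
import random
from statistics import NormalDist


def minus_log_z(n, eps, rng):
    """L_n = -log2 Z_n for the BEC process (12), tracked in the log domain so that
    doubly-exponentially small Z_n does not underflow."""
    L = -math.log2(eps)
    for _ in range(n):
        if rng.randrange(2) == 1:                   # B = 1: Z <- Z^2, L doubles
            L = 2 * L
        else:                                       # B = 0: Z <- Z(2 - Z), L drops by log2(2 - Z) <= 1
            z = 2.0 ** (-L) if L < 1100 else 0.0    # Z underflows to 0 anyway
            L = L - math.log2(2 - z)
    return L


def tail_fractions(n, eps, t, samples=20000, seed=0, big=1000.0):
    """(fraction with Z_n < 2^-big, fraction with log2(-log2 Z_n) >= n/2 + t sqrt(n)/2)."""
    rng = random.Random(seed)
    cut = n / 2 + t * math.sqrt(n) / 2   # n E[log S] + t sqrt(n V[log S]) for S uniform on {1, 2}
    polarized = clt = 0
    for _ in range(samples):
        L = minus_log_z(n, eps, rng)
        polarized += L > big
        clt += L > 0 and math.log2(L) >= cut
    return polarized / samples, clt / samples


eps, n, t = 0.5, 40, 0.0
frac, tail = tail_fractions(n, eps, t)
print(frac, 1 - eps)                           # close to P(Z_inf = 0) = 1 - eps
print(tail, (1 - eps) * NormalDist().cdf(-t))  # compare with (1 - eps) Q(t)
```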

The proof of Theorem 3 is done by making the above intuitive steps rigorous for a BMS channel $W$ and a polarizing $\ell \times \ell$ kernel matrix $G$. In a slightly more general setting, we study the asymptotic properties of $\mathbb{P}(X_n \le x)$ for any generic process $\{X_n\}_{n \in \mathbb{N}}$ satisfying the conditions (c1)-(c4) defined as follows.

Definition 5: Let $S$ be a random variable taking values in $[1, \infty)$. Assume that the expectation and the variance of $\log S$ exist and are denoted by $\mathbb{E}[\log S]$ and $\mathbb{V}[\log S]$, respectively. Assume that $\{S_n\}_{n \in \mathbb{N}}$ are i.i.d. samples of $S$. Let $\{X_n \in (0, 1)\}_{n \in \mathbb{N}}$ be a random process satisfying the following conditions:
(c1) There exists a random variable $X_\infty$ such that $X_n \to X_\infty$ holds almost surely.
(c2) $X_n^{S_n} \le X_{n+1}$ holds.
(c3) There exists a constant $c \ge 1$ such that $X_{n+1} \le c X_n^{S_n}$ holds.
(c4) $S_n$ is independent of $X_m$ for $m \le n$.

The random processes $\{Z_n\}_{n \in \mathbb{N}}$ and $\{1 - Z_n\}_{n \in \mathbb{N}}$ satisfy the above four conditions by letting $S_n = D_{B_n}(G)$ and $S_n = D_{B_n}(H)$, respectively. The fact that these processes satisfy condition (c1) has been proved in [5, Lemma 5.4], and the result reads that if $G$ is polarizing, then $Z_\infty$ takes only the values 0 and 1, with probabilities $I(W)$ and $1 - I(W)$, respectively. Conditions (c2) and (c3) (with $c = 2^\ell$) also hold because of (10) and (11), and (c4) holds because $S_n$ is a function of $B_n$, whereas $X_m$ is a function of $B_0, \ldots, B_{m-1}$.

Our objective now is to prove that for such a process $\{X_n\}_{n \in \mathbb{N}}$, we have

$$\lim_{n \to \infty} \mathbb{P}\left(X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)}}\right) = \mathbb{P}(X_\infty = 0)\, Q(t), \qquad (13)$$

where $f(n)$ is any function such that $f(n) = o(\sqrt{n})$ holds. The results of Theorem 3 then follow by noting that $\mathbb{P}(Z_\infty = 0) = I(W)$ and $\mathbb{P}(1 - Z_\infty = 0) = \mathbb{P}(Z_\infty = 1) = 1 - I(W)$ hold, that $\mathbb{E}[\log D_B(G)] = E(G) \log \ell$ and $\mathbb{V}[\log D_B(G)] = V(G) (\log \ell)^2$ (so that $2^{n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]}} = \ell^{nE(G) + t\sqrt{nV(G)}}$, and similarly for $H$), and by substituting $t = Q^{-1}(R/I(W))$ and $t = Q^{-1}(R'/(1 - I(W)))$, respectively, into (13).

We prove (13) by showing the two inequalities obtained by replacing the equality in (13) by an inequality in each direction. As the first step we have:
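Lemma 6: Let $\{X_n\}_{n \in \mathbb{N}}$ be a random process satisfying (c1), (c3) and (c4). For any $f(n) = o(\sqrt{n})$,

$$\liminf_{n \to \infty} \mathbb{P}\left(X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)}}\right) \ge \mathbb{P}(X_\infty = 0)\, Q(t).$$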



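Proof: Without loss of generality, we can assume that $c$ in condition (c3) satisfies $c \ge 2$. Define the process $\{L_n\}_{n \in \mathbb{N}}$ as $L_n = \log X_n$. From (c3), we have

$$L_n \le \log c + S_{n-1} L_{n-1},$$

and by applying the above relation recursively, for $m \le n - 1$ we obtain

$$L_n \le \sum_{j=m}^{n-1} \left(\prod_{i=j+1}^{n-1} S_i\right) \log c + \left(\prod_{i=m}^{n-1} S_i\right) L_m \le \left(\prod_{i=m}^{n-1} S_i\right) \big((n - m)\log c + L_m\big), \qquad (14)$$

where the last inequality uses $S_i \ge 1$ and $\log c \ge 0$. Fix $\beta \in (0, \mathbb{E}[\log S])$ and let

$$m = \left\lceil \frac{\log n + \log\log c}{\beta} \right\rceil. \qquad (15)$$

Conditioned on the event $\mathcal{D}_m(\beta) = \{X_m \le 2^{-2^{m\beta}}\}$, we have $L_m \le -2^{m\beta} \le -n \log c$, and by using (14) we obtain

$$L_n \le -\left(\prod_{i=m}^{n-1} S_i\right) m \log c.$$

Let the event $\mathcal{H}_m^n(t)$ be defined as

$$\mathcal{H}_m^n(t) \triangleq \left\{ \sum_{i=m}^{n-1} \log S_i \ge (n - m)\mathbb{E}[\log S] + t\sqrt{(n - m)\mathbb{V}[\log S]} + f(n - m) \right\},$$

where $f$ is any function such that $f(k) = o(\sqrt{k})$ holds. Conditioned on $\mathcal{D}_m(\beta)$ and $\mathcal{H}_m^n(t)$, we have

$$\log(-L_n) \ge \log m + \log\log c + (n - m)\mathbb{E}[\log S] + t\sqrt{(n - m)\mathbb{V}[\log S]} + f(n - m).$$

Hence,

$$\mathbb{P}\Big(\log(-L_n) \ge \log m + \log\log c + (n - m)\mathbb{E}[\log S] + t\sqrt{(n - m)\mathbb{V}[\log S]} + f(n - m)\Big) \ge \mathbb{P}\big(\mathcal{D}_m(\beta) \cap \mathcal{H}_m^n(t)\big) = \mathbb{P}\big(\mathcal{D}_m(\beta)\big)\, \mathbb{P}\big(\mathcal{H}_m^n(t)\big).$$

The last equality follows from the independence condition (c4).

Note that taking the limit $n \to \infty$ also implies $m \to \infty$ and $n - m \to \infty$ via (15). From Theorem 10 (in the Appendix), we have $\lim_{n \to \infty} \mathbb{P}(\mathcal{D}_m(\beta)) = \mathbb{P}(X_\infty = 0)$. We also have $\lim_{n \to \infty} \mathbb{P}(\mathcal{H}_m^n(t)) = Q(t)$ due to the central limit theorem for $\{\log S_i\}$. Since $m = O(\log n)$, replacing $n - m$ by $n$ and dropping $\log m + \log\log c$ changes the threshold only by $o(\sqrt{n})$, which can be absorbed into the function $f$. We consequently have

$$\liminf_{n \to \infty} \mathbb{P}\Big(\log(-\log X_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)\Big) \ge \mathbb{P}(X_\infty = 0)\, Q(t)$$

for any $f(n) = o(\sqrt{n})$. ■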

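The second step of the proof of (13) is to prove the other direction of the inequality. We have:

Lemma 7: Let $\{X_n\}_{n \in \mathbb{N}}$ be a random process satisfying (c1), (c2) and (c4). For any $f(n) = o(\sqrt{n})$,

$$\limsup_{n \to \infty} \mathbb{P}\left(X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)}}\right) \le \mathbb{P}(X_\infty = 0)\, Q(t).$$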
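Proof: Let $L_n \triangleq \log X_n$. From (c2), for $m \le n - 1$ we have

$$L_n \ge \left(\prod_{i=m}^{n-1} S_i\right) L_m,$$

and thus

$$\log(-L_n) \le \sum_{i=m}^{n-1} \log S_i + \log(-L_m). \qquad (16)$$

Hence, for any fixed $m$ and any $\delta \in (0, 1)$,

$$\begin{aligned}
&\limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)\Big) \\
&\le \limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n),\ X_m < \delta\Big) \\
&\quad + \limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n),\ X_m \ge \delta\Big). \qquad (17)
\end{aligned}$$

The first term on the right-hand side of (17) is upper bounded as

$$\begin{aligned}
&\limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n),\ X_m < \delta\Big) \\
&\overset{(a)}{\le} \limsup_{n \to \infty} \mathbb{P}\Big(\sum_{i=m}^{n-1} \log S_i + \log(-L_m) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n),\ X_m < \delta\Big) \\
&\overset{(b)}{=} Q(t)\, \mathbb{P}(X_m < \delta),
\end{aligned}$$

where (a) follows from (16), and (b) follows from (c4) and the central limit theorem. The second term on the right-hand side of (17) is upper bounded as

$$\begin{aligned}
&\limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n),\ X_m \ge \delta\Big) \\
&\le \limsup_{n \to \infty} \mathbb{P}\Big(X_n \le \frac{\delta}{2},\ X_m \ge \delta\Big) \overset{(a)}{\le} \mathbb{P}\Big(X_\infty \le \frac{\delta}{2},\ X_m \ge \delta\Big),
\end{aligned}$$

where the first inequality holds because the threshold tends to infinity, so that for large $n$ the event in question implies $X_n \le \delta/2$, and where (a) follows from (c1). Applying these bounds to (17) and letting $m \to \infty$, for any $\delta \in (0, 1)$ we have

$$\begin{aligned}
\limsup_{n \to \infty} \mathbb{P}\Big(\log(-L_n) \ge n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)\Big)
&\le \limsup_{m \to \infty} \Big[ Q(t)\, \mathbb{P}(X_m < \delta) + \mathbb{P}\Big(X_\infty \le \frac{\delta}{2},\ X_m \ge \delta\Big) \Big] \\
&\le Q(t)\, \mathbb{P}(X_\infty \le \delta) + \mathbb{P}\Big(X_\infty \le \frac{\delta}{2},\ X_\infty \ge \delta\Big) \\
&= Q(t)\, \mathbb{P}(X_\infty \le \delta).
\end{aligned}$$

By letting $\delta \to 0$, we obtain the result. ■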

C. Proof of Theorem 4 

Lemma 8: The MAP error probability of a linear code $\mathcal{C}$ over a BMS channel $W$ is lower bounded by $Z(W)^{2d_{\min}}/4$, where $d_{\min}$ is the minimum distance of $\mathcal{C}$.
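Proof: Within this proof, the notation $\mathbb{P}(\cdots)$ should be understood as generically denoting the probability of an event $(\cdots)$. Since the MAP error probability of a linear code over a BMS channel does not depend on the transmitted codeword, we can assume without loss of generality that the transmitted codeword is the all-zero codeword, which is denoted by $\mathbf{0}$. Let $Y$ be the random variable corresponding to a received sequence when $\mathbf{0}$ is transmitted, and let $P(y \mid c)$ be the likelihood of a codeword $c$ given a received sequence $y$. Since MAP and ML are equivalent for equiprobable codewords, the MAP error probability is lower bounded as

$$\mathbb{P}\Big(\bigcup_{c' \in \mathcal{C} \setminus \{\mathbf{0}\}} \{P(Y \mid c') \ge P(Y \mid \mathbf{0})\}\Big) \ge \mathbb{P}\big(P(Y \mid c) \ge P(Y \mid \mathbf{0})\big) \ge P_e\big(W^{\otimes w(c)}\big) \overset{(a)}{\ge} \frac{1}{2}\left(1 - \sqrt{1 - Z(W)^{2w(c)}}\right) \ge \frac{1}{4} Z(W)^{2w(c)}.$$

Here, $c$ is an arbitrary codeword in the set $\mathcal{C} \setminus \{\mathbf{0}\}$ and $w(c)$ denotes its Hamming weight. Also, $W^{\otimes m}$ denotes the $m$-parallel channel of $W$, which has the transition probabilities

$$W^{\otimes m}(y_1^m \mid x) \triangleq \prod_{i=1}^{m} W(y_i \mid x). \qquad (18)$$

Step (a) follows from (1) together with $Z(W^{\otimes m}) = Z(W)^m$, and the last step uses $1 - \sqrt{1 - x} \ge x/2$ for $x \in [0, 1]$. Choosing $c$ with $w(c) = d_{\min}$ gives the claim. ■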
It should be noted that the lower bound $P_e(W^{\otimes w(c)}) \ge (1/4) Z(W)^{2w(c)}$ in the proof of Lemma 8 is not asymptotically tight in terms of the conventional error exponents. It is possible to obtain tighter lower bounds via more elaborate arguments as in [22, Chapter 4]. However, since we are only interested in the behavior of double exponents, the above bound turns out to be sufficient for the purpose of proving Theorem 4.
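Lemma 8 can be sanity-checked on tiny codes by computing the exact MAP block error on a BSC. The sketch below is ours and exhaustive over all $2^N$ outputs, so it is only for very short codes. It uses minimum-distance decoding, which coincides with MAP for equiprobable codewords when $p < 1/2$, breaks ties uniformly at random, and compares the result with $Z(W)^{2d_{\min}}/4$, where $Z(W) = 2\sqrt{p(1-p)}$ for the BSC.

```python
import itertools
import math


def codewords(gen):
    """All GF(2) linear combinations of the generator rows."""
    n = len(gen[0])
    return {tuple(sum(c * row[k] for c, row in zip(coeffs, gen)) % 2 for k in range(n))
            for coeffs in itertools.product((0, 1), repeat=len(gen))}


def map_error_bsc(gen, p):
    """Exact block error of MAP (= minimum-distance for p < 1/2) decoding on BSC(p),
    all-zero codeword sent, ties broken uniformly at random."""
    words = codewords(gen)
    n = len(gen[0])
    err = 0.0
    for y in itertools.product((0, 1), repeat=n):
        dist = [sum(a != b for a, b in zip(y, c)) for c in words]
        d0, dmin = sum(y), min(dist)               # d0: distance to the all-zero codeword
        p_y = p ** d0 * (1 - p) ** (n - d0)        # P(y | 0)
        if d0 > dmin:
            err += p_y                             # a nonzero codeword is strictly closer
        else:
            err += p_y * (1 - 1 / dist.count(dmin))  # tie: wrong with prob 1 - 1/#ties
    return err


def lemma8_bound(gen, p):
    """Z(W)^(2 d_min) / 4 with Z(W) = 2 sqrt(p(1-p)) for the BSC."""
    dmin = min(sum(c) for c in codewords(gen) if any(c))
    return (2 * math.sqrt(p * (1 - p))) ** (2 * dmin) / 4


rep3 = [[1, 1, 1]]
rm13 = [[1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 1, 1, 0, 0],
        [1, 0, 1, 0, 1, 0, 1, 0], [1, 1, 1, 1, 1, 1, 1, 1]]   # rows 3, 5, 6, 7 of G^{(x)3}
for gen, p in ((rep3, 0.1), (rm13, 0.05)):
    print(map_error_bsc(gen, p), lemma8_bound(gen, p))
```

For the length-3 repetition code at $p = 0.1$ the exact error is $3p^2(1 - p) + p^3 = 0.028$, against a bound of $0.6^6/4 = 0.011664$.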

In order to prove Theorem 4, from Lemma 8 it is sufficient to prove that given any $\epsilon > 0$ there exists an integer $M \in \mathbb{N}$ such that for $n \ge M$,

$$\log_\ell\big(d(n, R)\big) \le nE_w(G) + \sqrt{nV_w(G)} \left( Q^{-1}\left(\frac{R}{I(W)}\right) + \epsilon \right),$$

where $d(n, R)$ is the minimum distance of a polar code using the kernel matrix $G$, with block-length $N = \ell^n$ and rate $R$. Indeed, Lemma 8 gives $-\log P_e^{\mathrm{MAP}}(\ell^n, R) \le 2d(n, R)(-\log Z(W)) + 2$, so that $\log_\ell(-\log P_e^{\mathrm{MAP}}(\ell^n, R)) \le \log_\ell d(n, R) + O(1)$, and $\epsilon > 0$ is arbitrary. Since the weight of any row of the generator matrix is an upper bound on the minimum distance of a linear code, and since the weight of the $i$th row of $G^{\otimes n}$ is equal to $\prod_{j=1}^{n} w_{i_j}(G)$, where $i_j$ is the $j$th digit of the $\ell$-ary representation of $i - 1$, it is therefore sufficient to prove that given any $\epsilon > 0$, there exists an integer $M \in \mathbb{N}$ such that for a polar code of block-length $N = \ell^n$ with $n \ge M$, rate $R$ and set of chosen indices $\mathcal{I}$, there exists $i \in \mathcal{I}$ for which the inequality

$$\sum_{j=1}^{n} \log_\ell w_{i_j}(G) \le nE_w(G) + \sqrt{nV_w(G)} \left( Q^{-1}\left(\frac{R}{I(W)}\right) + \epsilon \right) \qquad (19)$$

holds. In the proof of Theorem 3, one can observe that the key idea is to apply the central limit theorem to $\{\log S_n = \log D_{B_n}(G)\}_{n \in \mathbb{N}}$. In the same spirit, in order to prove Theorem 4 we consider the random process $\{\log w_{B_n}(G)\}_{n \in \mathbb{N}}$ in addition to $\{\log D_{B_n}(G)\}_{n \in \mathbb{N}}$. Note that these processes are in general correlated, since they are both coupled to the same process $\{B_n\}_{n \in \mathbb{N}}$. They are equal with probability one in the special case where $D_i(G) = w_i(G)$ holds for all $i \in \{0, 1, \ldots, \ell - 1\}$. In the same manner as in the proof of Theorem 3, we move on to a more abstract setting by introducing a random variable $U$ taking values in $[1, \infty)$, for which we assume that the expectation and the variance of $\log U$ exist and are denoted by $\mathbb{E}[\log U]$ and $\mathbb{V}[\log U]$, respectively, and by letting $\{(S_n, U_n)\}_{n \in \mathbb{N}}$ be i.i.d. drawings of $(S, U)$, where $S$ is defined as in Definition 5. Let $\{(X_n, S_n, U_n)\}_{n \in \mathbb{N}}$ be a random process such that $\{(X_n, S_n)\}_{n \in \mathbb{N}}$ satisfies the conditions (c1) to (c4), together with the additional condition (c5) for $\{U_n\}_{n \in \mathbb{N}}$:
(c5) $U_n$ is independent of $X_m$ for $m \le n$.

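It is easy to see that the stochastic process of triplets $\{(Z_n, D_{B_n}(G), w_{B_n}(G))\}_{n \in \mathbb{N}}$ satisfies (c1) to (c5). We first note from the proof of Theorem 3 that for any generic process $\{(X_n, S_n, U_n)\}_{n \in \mathbb{N}}$ satisfying (c1) to (c5), the relation (13) holds for any function $f(n) = o(\sqrt{n})$. We also claim that for real numbers $v, t$ such that $v \ge t$ and for any function $g(n) = o(\sqrt{n})$, we have

$$\limsup_{n \to \infty} \mathbb{P}\left( X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)}},\ \sum_{i=0}^{n-1} \log U_i \ge n\mathbb{E}[\log U] + v\sqrt{n\mathbb{V}[\log U]} + g(n) \right) \le \mathbb{P}(X_\infty = 0)\, Q(v). \qquad (20)$$

Using the relations (13) and (20), it is easy to see that for generator matrices of polar codes with rate $R$, the number of rows satisfying (19) is asymptotically proportional to the block-length: by (13) with $t = Q^{-1}(R/I(W))$, the chosen rows correspond asymptotically to the event in (20) on $X_n$, which has probability $R$, whereas by (20) with $v = t + \epsilon$, the subset of those rows violating (19) has asymptotic probability at most $I(W)\, Q(t + \epsilon) < R$. Hence there exists at least one row satisfying (19). We now turn to the proof of (20).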

Lemma 9: Let $\{(X_n, S_n, U_n)\}_{n \in \mathbb{N}}$ be a random process satisfying (c1) to (c5). For any $f(n) = o(\sqrt{n})$ and $g(n) = o(\sqrt{n})$,

$$\lim_{n \to \infty} \mathbb{P}\left( X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + t\sqrt{n\mathbb{V}[\log S]} + f(n)}},\ \sum_{i=0}^{n-1} \log U_i \ge n\mathbb{E}[\log U] + v\sqrt{n\mathbb{V}[\log U]} + g(n) \right) = \mathbb{P}(X_\infty = 0)\, \mathbb{P}(A_S \ge t, A_U \ge v),$$

where $(A_S, A_U)$ are jointly Gaussian random variables with mean zero whose covariance matrix is equal to that of

$$\left( \frac{\log S - \mathbb{E}[\log S]}{\sqrt{\mathbb{V}[\log S]}},\ \frac{\log U - \mathbb{E}[\log U]}{\sqrt{\mathbb{V}[\log U]}} \right).$$

The proof of this lemma is the same as the proofs of Lemma 6 and Lemma 7. The difference is that the central limit theorem is replaced by the two-dimensional central limit theorem. From $\mathbb{P}(A_S \ge t, A_U \ge v) \le Q(\max\{t, v\})$, the relation (20) is obtained for $v \ge t$. This completes the proof of Theorem 4.

Remark: Let $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. For this choice of $G$, we have $w_i(G) = D_i(G)$ for $i = 0$ and $1$. Hence, the random variables $S_n = D_{B_n}(G)$ and $U_n = w_{B_n}(G)$ are equal for $n \in \mathbb{N}$. Also note that $S_n$ takes its value in the set $\{1, 2\}$ uniformly at random. From the proof of Theorem 4, the set of indices of the rows of polar codes with the kernel matrix $G$ and rate $R$ corresponds to the event

$$\left\{ X_n \le 2^{-2^{\,n\mathbb{E}[\log S] + Q^{-1}(R/I(W))\sqrt{n\mathbb{V}[\log S]} + f(n)}} \right\}.$$

Also, with the same $G$, the set of indices of an RM code with rate $R'$ corresponds to the event

$$\left\{ \sum_{i=0}^{n-1} \log U_i \ge n\mathbb{E}[\log U] + Q^{-1}(R')\sqrt{n\mathbb{V}[\log U]} + g(n) \right\}.$$

From Lemma 9, since $A_S = A_U$ in this case, it is easy to conclude that the fraction of the common chosen row indices of $G^{\otimes n}$ between polar codes of rate $R$ and RM codes of rate $R'$ tends to $I(W) \min\{R/I(W), R'\}$ as $n \to \infty$.
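The limit in the Remark can be explored numerically for the BEC. The sketch below (ours) compares the polar index set of rate $R$ with an RM-like set of rate $R'$ made of the indices with the most ones. Working with channel indices rather than row indices is harmless here, because digit reversal preserves the number of ones and hence the row weight. Ties in the RM-like ranking are broken by index, which is our choice, and the finite-$n$ values need not be close to the limit $I(W)\min\{R/I(W), R'\}$.

```python
def bec_level(n, eps):
    z = [eps]
    for _ in range(n):
        z = [c for v in z for c in (2 * v - v * v, v * v)]
    return z


def polar_rm_overlap(n, eps, rate, rate_rm):
    """Fraction of the 2^n indices chosen both by the polar code of rate R on BEC(eps)
    and by an RM-like code of rate R' (indices with the most ones, i.e. heaviest rows)."""
    N = 2 ** n
    z = bec_level(n, eps)
    polar = set(sorted(range(N), key=lambda v: z[v])[: round(N * rate)])
    rm = set(sorted(range(N), key=lambda v: -bin(v).count("1"))[: round(N * rate_rm)])
    return len(polar & rm) / N


def overlap_limit(capacity, rate, rate_rm):
    """I(W) * min(R / I(W), R') from the Remark."""
    return capacity * min(rate / capacity, rate_rm)


eps, rate, rate_rm = 0.5, 0.25, 0.3
for n in (8, 12, 16):
    print(n, polar_rm_overlap(n, eps, rate, rate_rm), overlap_limit(1 - eps, rate, rate_rm))
```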
IV. Selection rule of rows

The proof of Lemma 6 suggests a way to select the good indices in a more computationally efficient manner. In the proof, the $\ell$-ary expansion of the index $i - 1$ of a channel $W_N^{(i)}$ (the digit-reversed version of the corresponding row index of $G^{\otimes n}$) corresponds to a realization of $B_0, \ldots, B_{n-1}$. The proof of Lemma 6 implies that it is sufficient to select rows in $\mathcal{D}_m(\beta) \cap \mathcal{H}_m^n(t)$ in order to achieve the asymptotically optimum performance. It should be noted that the event $\mathcal{D}_m(\beta)$ applied to the Bhattacharyya process $\{Z_n = Z(W_n)\}_{n \in \mathbb{N}}$ of $W$ depends on the channel $W$, whereas the event $\mathcal{H}_m^n(t)$ is channel-independent. This observation leads to the following selection rule. The first $m = s(n) \triangleq (\log n + \log\log c)/\beta$ digits of the indices are determined in a channel-dependent way. Then, the remaining $n - m$ digits are determined in the RM way, i.e., those combinations of digits $(B_m, \ldots, B_{n-1})$ giving large values of $\sum_{i=m}^{n-1} \log D_{B_i}(G)$ are selected. In this rule, only the first $\Theta(\log n)$ digits need to be determined depending on the channel.

The above argument can further be extended in a recursive manner. Let $\mathcal{C}_m^{n-1}(\epsilon) = \{ (n - m)^{-1} \sum_{i=m}^{n-1} \log S_i \ge \mathbb{E}[\log S] - \epsilon \}$. Then, it is sufficient to select rows in $\mathcal{D}_{m_2}(\beta) \cap \mathcal{C}_{m_2}^{m_1 - 1}(\epsilon) \cap \mathcal{H}_{m_1}^n(t)$, where $m_1 = s(n)$ and $m_2 = s(m_1)$, since $\mathcal{D}_{m_1}(\beta)$ and $\mathcal{D}_{m_2}(\beta) \cap \mathcal{C}_{m_2}^{m_1 - 1}(\mathbb{E}[\log S] - \beta)$ are asymptotically equal. (Use $\mathcal{C}_m^{n-1}(\epsilon)$ instead of $\mathcal{H}_m^n(t)$ in the proof of Lemma 6. A similar argument can be found in [1, Section IV-B].) From this observation, only $\Theta(\log\log n)$ digits have to be determined depending on the channel. By iterating this argument, we obtain a selection rule in which only

$$\Theta\big(\underbrace{\log \log \cdots \log}_{k}\, n\big) \qquad (21)$$

digits depend on the channel, for any $k \in \mathbb{N}$. From the argument so far, we deduce that even though the behavior of $Z_n = Z(W_n)$ depends on the channel $W$ as well as on the whole sequence $\{B_0, B_1, \ldots, B_{n-1}\}$, its "fate", namely whether it approaches 0 or 1 when $n$ is large, is mostly determined by the channel $W$ and a prefix of $\{B_0, B_1, \ldots, B_{n-1}\}$ of relatively small length. Thus, to choose the indices of the channels $W_N^{(i)}$ that have the best quality, a sublinear number of the most significant digits of the $\ell$-ary expansion of $i - 1$ are determined depending on the channel, and the rest are determined in an RM-like fashion. It should be noted that the above argument is valid in the large-$n$ asymptotics. It does not mean that one can make the number of digits determined in the channel-dependent manner arbitrarily small.
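A finite-$n$ toy version of this selection rule is sketched below for the BEC with $\ell = 2$ and $G = \begin{bmatrix} 1 & 0 \\ 1 & 1 \end{bmatrix}$. It is only an illustration under our own assumptions: the rule above is stated asymptotically, whereas here the prefix length `m`, the number of kept prefixes (through the hypothetical `slack` parameter) and the tie-breaking are ad hoc choices. The first $m$ digits of $i - 1$ are ranked by the exact Bhattacharyya parameter at depth $m$ (channel-dependent), and the remaining $n - m$ digits RM-style by their number of ones (channel-independent).

```python
import math


def bec_level(n, eps):
    z = [eps]
    for _ in range(n):
        z = [c for v in z for c in (2 * v - v * v, v * v)]
    return z


def hybrid_selection(n, m, eps, rate, slack=1.5):
    """Hypothetical two-stage rule: prefixes (first m digits of i - 1) ranked by Z at
    depth m, suffixes (last n - m digits) ranked by their number of ones."""
    N, K = 2 ** n, round(2 ** n * rate)
    z_prefix = bec_level(m, eps)                        # only 2^m channel evaluations
    keep = math.ceil(2 ** m * min(1.0, slack * rate))   # prefixes kept (ad hoc slack)
    good = sorted(range(2 ** m), key=lambda p: z_prefix[p])[:keep]
    rank = {p: r for r, p in enumerate(good)}
    shift = n - m
    cands = [(p << shift) | s for p in good for s in range(2 ** shift)]
    # primary key: more ones in the suffix; secondary key: better prefix
    cands.sort(key=lambda v: (-bin(v & ((1 << shift) - 1)).count("1"), rank[v >> shift]))
    return sorted(cands[:K])


n, m, eps, rate = 12, 6, 0.5, 0.25
z_full = bec_level(n, eps)
exact = set(sorted(range(2 ** n), key=lambda v: z_full[v])[: round(2 ** n * rate)])
chosen = hybrid_selection(n, m, eps, rate)
print(len(set(chosen) & exact) / len(exact))   # agreement with the exact choice
```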

Although the good indices of the rows of $G^{\otimes n}$ can be selected using density evolution [3], in practice the storage and convolution of probability density functions are exponentially costly (in the block-length $N$) in terms of memory and computation. Recently, several authors have considered accurate and efficient implementations of the density evolution procedure [23], [24]. The above-mentioned construction rule can be useful in reducing the number of convolutions and the number of levels in the quantization of channels.
Appendix

Theorem 10: Let $\{X_n \in (0, 1)\}_{n \in \mathbb{N}}$ be a random process satisfying (c1) and (c3). For any fixed $\beta \in (0, \mathbb{E}[\log S])$,

$$\lim_{n \to \infty} \mathbb{P}\big(X_n \le 2^{-2^{n\beta}}\big) = \mathbb{P}(X_\infty = 0).$$

Remark: Although Theorem 10 has already been stated for Bhattacharyya processes $\{Z_n\}_{n \in \mathbb{N}}$ in [2], [5], we would nevertheless like to confirm that the result is obtained by using only the two conditions (c1) and (c3).

Proof of Theorem 10: As the inequality

$$\limsup_{n \to \infty} \mathbb{P}\big(X_n \le 2^{-2^{n\beta}}\big) \le \mathbb{P}(X_\infty = 0)$$

obviously holds by (c1), since $2^{-2^{n\beta}} \to 0$, a proof of the lower bound

$$\liminf_{n \to \infty} \mathbb{P}\big(X_n \le 2^{-2^{n\beta}}\big) \ge \mathbb{P}(X_\infty = 0)$$

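is given in the following. Fix $\epsilon \in (0, 1)$. Let $\{J_n\}_{n \in \mathbb{N}}$ be the random process defined as

$$J_n = \begin{cases} \log(-\log X_n), & \text{for } n = 0, \ldots, m, \\ \log(S_{n-1} - \epsilon) + J_{n-1}, & \text{for } n > m, \end{cases}$$

which is to be used for deriving a probabilistic bound for $\{X_n\}_{n \in \mathbb{N}}$. Let $\mathcal{T}_m^n(\gamma) = \{X_i \le \gamma \text{ for } i = m, m + 1, \ldots, n\}$. Fix $k \in \{1, 2, \ldots\}$. From (c3), conditioned on $\mathcal{T}_m^{m+k-1}(c^{-1/\epsilon})$, the inequality $\log(-\log X_n) \ge J_n$ holds for $n = m, m + 1, \ldots, m + k$. Indeed, if $X_n \le c^{-1/\epsilon}$, then $\log c \le \epsilon(-\log X_n)$, so that (c3) gives $-\log X_{n+1} \ge S_n(-\log X_n) - \log c \ge (S_n - \epsilon)(-\log X_n)$. For the process $\{J_n\}_{n \in \mathbb{N}}$, the inequality

$$J_{m+k} = J_m + \sum_{i=m}^{m+k-1} \log(S_i - \epsilon) \ge J_m + \sum_{i=m}^{m+k-1} \big(\log S_i + \log(1 - \epsilon)\big)$$

holds since $S_i \ge 1$. This inequality immediately implies the following conditional bound: conditioned on

$$\mathcal{C}_m^{m+k-1}(\epsilon) \triangleq \left\{ \frac{1}{k} \sum_{i=m}^{m+k-1} \log S_i \ge \mathbb{E}[\log S] - \epsilon \right\},$$

one has

$$J_{m+k} \ge J_m + k\big(\mathbb{E}[\log S] - \epsilon + \log(1 - \epsilon)\big).$$

We have therefore obtained a probabilistic bound on $\log(-\log X_{m+k})$ of the form

$$\mathbb{P}\Big(\log(-\log X_{m+k}) \ge J_m + k\big(\mathbb{E}[\log S] - \epsilon + \log(1 - \epsilon)\big)\Big) \ge \mathbb{P}\big(\mathcal{T}_m^{m+k-1}(c^{-1/\epsilon}) \cap \mathcal{C}_m^{m+k-1}(\epsilon)\big)$$

for any $m \in \mathbb{N}$, $k \in \mathbb{N}$ and $\epsilon > 0$. From the law of large numbers, $\lim_{k \to \infty} \mathbb{P}(\mathcal{C}_m^{m+k-1}(\epsilon)) = 1$. From (c1), $\lim_{m \to \infty} \lim_{k \to \infty} \mathbb{P}(\mathcal{T}_m^{m+k-1}(c^{-1/\epsilon})) \ge \mathbb{P}(X_\infty < c^{-1/\epsilon})$. Hence,

$$\liminf_{m \to \infty} \liminf_{k \to \infty} \mathbb{P}\Big(\log(-\log X_{m+k}) \ge J_m + k\big(\mathbb{E}[\log S] - \epsilon + \log(1 - \epsilon)\big)\Big) \ge \mathbb{P}(X_\infty < c^{-1/\epsilon}) \ge \mathbb{P}(X_\infty = 0)$$

holds for any $\epsilon > 0$. On the other hand, we observe that

$$\liminf_{n \to \infty} \mathbb{P}\left( \frac{1}{n} \log(-\log X_n) \ge \mathbb{E}[\log S] - \gamma \right) \ge \liminf_{k \to \infty} \mathbb{P}\Big(\log(-\log X_{m+k}) \ge J_m + k\big(\mathbb{E}[\log S] - \epsilon + \log(1 - \epsilon)\big)\Big)$$

holds for any fixed $m \in \mathbb{N}$ and $\gamma > \theta(\epsilon) \triangleq \epsilon - \log(1 - \epsilon)$. Hence,

$$\liminf_{n \to \infty} \mathbb{P}\left( \frac{1}{n} \log(-\log X_n) \ge \mathbb{E}[\log S] - \gamma \right) \ge \mathbb{P}(X_\infty = 0)$$

for any $\gamma > 0$, since $\theta(\epsilon) > 0$ for $\epsilon > 0$ and $\lim_{\epsilon \to 0} \theta(\epsilon) = 0$. Choosing $\gamma = \mathbb{E}[\log S] - \beta > 0$ yields $\liminf_{n \to \infty} \mathbb{P}(X_n \le 2^{-2^{n\beta}}) \ge \mathbb{P}(X_\infty = 0)$, which completes the proof. ■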